Relation Networks for Object Detection
Although it has long been believed that modeling relations between
objects would help object recognition, there has been no evidence that the
idea works in the deep learning era. All state-of-the-art object detection
systems still rely on recognizing object instances individually, without
exploiting their relations during learning.
This work proposes an object relation module. It processes a set of objects
simultaneously through interaction between their appearance features and
geometry, thus allowing modeling of their relations. It is lightweight and
in-place. It requires no additional supervision and is easy to embed in
existing networks. It is shown effective at improving the object recognition
and duplicate removal steps in the modern object detection pipeline, verifying
the efficacy of modeling object relations in CNN based detection, and it gives
rise to the first fully end-to-end object detector.
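The module described above can be pictured as attention whose weights combine an appearance affinity with a geometric term. A toy numpy sketch follows; the random projection matrices stand in for learned weights, and the geometric term is a simplified center-distance prior rather than the paper's learned geometry embedding:

```python
import numpy as np

def relation_module(appearance, boxes, d_k=16, seed=0):
    """Toy object relation module: each object's feature is updated by
    attending to all objects, with weights that mix appearance similarity
    and a geometric prior. appearance: (N, D); boxes: (N, 4) as (x, y, w, h)."""
    rng = np.random.default_rng(seed)
    N, D = appearance.shape
    # Random stand-ins for learned projection matrices.
    Wq = rng.standard_normal((D, d_k)) / np.sqrt(D)
    Wk = rng.standard_normal((D, d_k)) / np.sqrt(D)
    Wv = rng.standard_normal((D, D)) / np.sqrt(D)

    q = appearance @ Wq                       # queries
    k = appearance @ Wk                       # keys
    app_logits = (q @ k.T) / np.sqrt(d_k)     # appearance affinity

    # Simplified geometric weight: nearer box centers get larger logits.
    centers = boxes[:, :2] + boxes[:, 2:] / 2
    dist = np.linalg.norm(centers[:, None] - centers[None, :], axis=-1)
    geo_logits = -np.log1p(dist)

    logits = app_logits + geo_logits
    weights = np.exp(logits - logits.max(axis=1, keepdims=True))
    weights /= weights.sum(axis=1, keepdims=True)

    # Residual ("in-place") update: original feature plus attended message.
    return appearance + weights @ (appearance @ Wv)
```

The residual form is what makes the module drop-in: its output has the same shape as its input, so it can be inserted between existing layers without changing the surrounding network.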
Multi-view PointNet for 3D Scene Understanding
Fusion of 2D images and 3D point clouds is important because information from
dense images can enhance sparse point clouds. However, fusion is challenging
because 2D and 3D data live in different spaces. In this work, we propose
MVPNet (Multi-View PointNet), where we aggregate 2D multi-view image features
into 3D point clouds, and then use a point-based network to fuse the features
in 3D canonical space to predict 3D semantic labels. To this end, we introduce
view selection along with a 2D-3D feature aggregation module. Extensive
experiments show the benefit of leveraging features from dense images and
reveal superior robustness to varying point cloud density compared to 3D-only
methods. On the ScanNetV2 benchmark, our MVPNet significantly outperforms prior
point cloud based approaches on the task of 3D Semantic Segmentation. It is
much faster to train than the large networks of the sparse voxel approach. We
provide solid ablation studies to ease the future design of 2D-3D fusion
methods and their extension to other tasks, as we showcase for 3D instance
segmentation.
Comment: Geometry Meets Deep Learning Workshop, ICCV 201
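The core idea of aggregating 2D multi-view image features into 3D points can be sketched as follows. This is an illustrative simplification, not the paper's API: it projects each point into every view with a pinhole camera, samples the nearest-pixel feature, and averages over the views in which the point is visible.

```python
import numpy as np

def lift_features_to_points(points, feat_maps, K, poses):
    """Toy 2D-3D feature aggregation.
    points: (N, 3) world coordinates; feat_maps: (V, H, W, C) per-view
    2D features; K: (3, 3) intrinsics; poses: list of (R, t)
    world-to-camera transforms, one per view."""
    V, H, W, C = feat_maps.shape
    out = np.zeros((points.shape[0], C))
    hits = np.zeros((points.shape[0], 1))
    for v, (R, t) in enumerate(poses):
        cam = points @ R.T + t                 # world -> camera coordinates
        z = cam[:, 2]
        front = z > 1e-6                       # keep points in front of camera
        uvw = cam @ K.T                        # pinhole projection
        u = np.full(z.shape, -1, dtype=int)
        w_pix = np.full(z.shape, -1, dtype=int)
        u[front] = np.round(uvw[front, 0] / z[front]).astype(int)
        w_pix[front] = np.round(uvw[front, 1] / z[front]).astype(int)
        valid = front & (u >= 0) & (u < W) & (w_pix >= 0) & (w_pix < H)
        out[valid] += feat_maps[v, w_pix[valid], u[valid]]
        hits[valid] += 1
    return out / np.maximum(hits, 1)           # per-point averaged 2D features
```

The lifted per-point features would then be concatenated with raw point attributes and fed to a point-based network operating in 3D canonical space.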
The Value of Autofluorescence Bronchoscopy Combined with White Light Bronchoscopy Compared with White Light Alone in the Diagnosis of Intraepithelial Neoplasia and Invasive Lung Cancer: A Meta-Analysis
Objective: To compare the accuracy of autofluorescence bronchoscopy (AFB) combined with white light bronchoscopy (WLB) versus WLB alone in the diagnosis of lung cancer.
Methods: The Ovid, PubMed, and Google Scholar databases were searched from January 1990 to October 2010. Two reviewers independently assessed the quality of the trials and extracted data. The relative risks for sensitivity and specificity on a per-lesion basis of AFB + WLB versus WLB alone to detect intraepithelial neoplasia and invasive cancer were pooled with Review Manager.
Results: Twenty-one studies involving 3266 patients were ultimately analyzed. The pooled relative sensitivity on a per-lesion basis of AFB + WLB versus WLB alone to detect intraepithelial neoplasia and invasive cancer was 2.04 (95% confidence interval [CI] 1.72–2.42) and 1.15 (95% CI 1.05–1.26), respectively. The pooled relative specificity of AFB + WLB versus WLB alone was 0.65 (95% CI 0.59–0.73).
Conclusions: Although the specificity of AFB + WLB is lower than that of WLB alone, AFB + WLB appears to significantly improve sensitivity for detecting intraepithelial neoplasia. This advantage over WLB alone appears much smaller for detecting invasive lung cancer.
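The meta-analysis pools per-lesion relative risks across studies with Review Manager. Purely as a rough illustration (not the review's actual computation), a fixed-effect inverse-variance pooling of relative risks on the log scale can be sketched as:

```python
import math

def relative_risk(a, n1, b, n2):
    """Relative risk of group 1 (a events of n1) vs group 2 (b events of n2),
    with a 95% CI from the log-scale normal approximation."""
    rr = (a / n1) / (b / n2)
    se = math.sqrt(1 / a - 1 / n1 + 1 / b - 1 / n2)
    lo = math.exp(math.log(rr) - 1.96 * se)
    hi = math.exp(math.log(rr) + 1.96 * se)
    return rr, lo, hi

def pool_fixed_effect(studies):
    """Fixed-effect (inverse-variance) pooling of relative risks.
    studies: iterable of (a, n1, b, n2) per-study counts."""
    num = den = 0.0
    for a, n1, b, n2 in studies:
        log_rr = math.log((a / n1) / (b / n2))
        var = 1 / a - 1 / n1 + 1 / b - 1 / n2   # variance of log RR
        num += log_rr / var                      # weight = 1 / var
        den += 1 / var
    pooled = math.exp(num / den)
    se = math.sqrt(1 / den)
    return pooled, pooled * math.exp(-1.96 * se), pooled * math.exp(1.96 * se)
```

A pooled ratio above 1 with a CI excluding 1 (as with the 2.04 sensitivity result above) indicates that the combined procedure detects significantly more lesions.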
One-2-3-45++: Fast Single Image to 3D Objects with Consistent Multi-View Generation and 3D Diffusion
Recent advancements in open-world 3D object generation have been remarkable,
with image-to-3D methods offering superior fine-grained control over their
text-to-3D counterparts. However, most existing models fall short in
simultaneously providing rapid generation speeds and high fidelity to input
images - two features essential for practical applications. In this paper, we
present One-2-3-45++, an innovative method that transforms a single image into
a detailed 3D textured mesh in approximately one minute. Our approach aims to
fully harness the extensive knowledge embedded in 2D diffusion models and
priors from valuable yet limited 3D data. This is achieved by initially
finetuning a 2D diffusion model for consistent multi-view image generation,
followed by elevating these images to 3D with the aid of multi-view conditioned
3D native diffusion models. Extensive experimental evaluations demonstrate that
our method can produce high-quality, diverse 3D assets that closely mirror the
original input image. Our project webpage:
https://sudo-ai-3d.github.io/One2345plus_page
Multi-site, Multi-domain Airway Tree Modeling (ATM'22): A Public Benchmark for Pulmonary Airway Segmentation
Open international challenges are becoming the de facto standard for
assessing computer vision and image analysis algorithms. In recent years, new
methods have extended the reach of pulmonary airway segmentation closer to
the limit of image resolution. Since the EXACT'09 pulmonary airway segmentation
challenge, however, limited effort has been directed to quantitative comparison
of newly emerged algorithms, which have been driven by the maturity of deep
learning based approaches and by the clinical need to resolve finer details of
distal airways for early intervention in pulmonary disease. Thus far, publicly
annotated datasets are
extremely limited, hindering the development of data-driven methods and
detailed performance evaluation of new algorithms. To provide a benchmark for
the medical imaging community, we organized the Multi-site, Multi-domain Airway
Tree Modeling challenge (ATM'22), which was held as an official challenge event during
the MICCAI 2022 conference. ATM'22 provides large-scale CT scans with detailed
pulmonary airway annotation, including 500 CT scans (300 for training, 50 for
validation, and 150 for testing). The dataset was collected from different
sites and it further included a portion of noisy COVID-19 CTs with ground-glass
opacity and consolidation. Twenty-three teams participated in the entire phase
of the challenge and the algorithms for the top ten teams are reviewed in this
paper. Quantitative and qualitative results revealed that deep learning models
embedded with the topological continuity enhancement achieved superior
performance in general. The ATM'22 challenge remains open by design: the
training data and the gold-standard evaluation are available upon successful
registration via its homepage.
Comment: 32 pages, 16 figures. Homepage: https://atm22.grand-challenge.org/.
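The official evaluation protocol is defined on the challenge homepage; purely as an illustration of why topological continuity matters for tree-shaped structures (the metric below is a hypothetical simplification, not ATM'22's official one), one can measure how much of a reference airway centerline a predicted segmentation covers, which penalizes breaks along branches more than a plain voxel overlap would:

```python
import numpy as np

def centerline_covered_fraction(ref_centerline, pred_mask):
    """Toy topology-oriented score: fraction of reference centerline voxels
    that fall inside the predicted binary mask.
    ref_centerline: (M, 3) integer voxel coordinates; pred_mask: 3D 0/1 array."""
    covered = pred_mask[tuple(ref_centerline.T)]  # look up each centerline voxel
    return float(covered.mean())
```

A prediction with the same voxel count but a gap mid-branch scores lower here, loosely mirroring how topology-aware training objectives reward continuous distal airways.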